1 Executive Summary

The aim of this report is to investigate trends in Australian weather data from 2007 to 2017 and to discuss the following research questions:

  • Where is the most optimal location for agricultural production in Australia?
  • How does temperature vary in different climates across Australia?
  • Where is the ideal location for generating renewable energy in Australia?


Main discoveries:

  • Darwin and Cairns are the most optimal locations for agricultural production in Australia
  • There are distinct temperature trends for temperate sub-tropical, hot desert and tropical savanna climates
  • Woomera and Darwin are the most ideal locations for generating wind and solar renewable energy in Australia



2 Initial Data Analysis (IDA)

2.1 A Glimpse of the Data Set

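# Loading the packages used throughout this report
library(tidyverse)
library(plotly)
library(knitr)
library(kableExtra)

# Loading weather data from local .csv file
# (stringsAsFactors = TRUE keeps text columns as factors, matching the str() output below)
weather = read.csv("data/weatherAUS.csv", stringsAsFactors = TRUE)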

# Quick look at top 6 rows of data
kable(head(weather), "html") %>%
    kable_styling(bootstrap_options = c("striped", "hover")) %>%
    scroll_box(width = "100%")
Date Location MinTemp MaxTemp Rainfall Evaporation Sunshine WindGustDir WindGustSpeed WindDir9am WindDir3pm WindSpeed9am WindSpeed3pm Humidity9am Humidity3pm Pressure9am Pressure3pm Cloud9am Cloud3pm Temp9am Temp3pm RainToday RISK_MM RainTomorrow
2008-12-01 Albury 13.4 22.9 0.6 NA NA W 44 W WNW 20 24 71 22 1007.7 1007.1 8 NA 16.9 21.8 No 0.0 No
2008-12-02 Albury 7.4 25.1 0.0 NA NA WNW 44 NNW WSW 4 22 44 25 1010.6 1007.8 NA NA 17.2 24.3 No 0.0 No
2008-12-03 Albury 12.9 25.7 0.0 NA NA WSW 46 W WSW 19 26 38 30 1007.6 1008.7 NA 2 21.0 23.2 No 0.0 No
2008-12-04 Albury 9.2 28.0 0.0 NA NA NE 24 SE E 11 9 45 16 1017.6 1012.8 NA NA 18.1 26.5 No 1.0 No
2008-12-05 Albury 17.5 32.3 1.0 NA NA W 41 ENE NW 7 20 82 33 1010.8 1006.0 7 8 17.8 29.7 No 0.2 No
2008-12-06 Albury 14.6 29.7 0.2 NA NA WNW 56 W W 19 24 55 23 1009.2 1005.4 NA NA 20.6 28.9 No 0.0 No


2.2 Assessing R’s Classification of the Variables

# Size of the data and R's classification of the variables
str(weather)
## 'data.frame':    142193 obs. of  24 variables:
##  $ Date         : Factor w/ 3436 levels "2007-11-01","2007-11-02",..: 397 398 399 400 401 402 403 404 405 406 ...
##  $ Location     : Factor w/ 49 levels "Adelaide","Albany",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ MinTemp      : num  13.4 7.4 12.9 9.2 17.5 14.6 14.3 7.7 9.7 13.1 ...
##  $ MaxTemp      : num  22.9 25.1 25.7 28 32.3 29.7 25 26.7 31.9 30.1 ...
##  $ Rainfall     : num  0.6 0 0 0 1 0.2 0 0 0 1.4 ...
##  $ Evaporation  : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ Sunshine     : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ WindGustDir  : Factor w/ 16 levels "E","ENE","ESE",..: 14 15 16 5 14 15 14 14 7 14 ...
##  $ WindGustSpeed: int  44 44 46 24 41 56 50 35 80 28 ...
##  $ WindDir9am   : Factor w/ 16 levels "E","ENE","ESE",..: 14 7 14 10 2 14 13 11 10 9 ...
##  $ WindDir3pm   : Factor w/ 16 levels "E","ENE","ESE",..: 15 16 16 1 8 14 14 14 8 11 ...
##  $ WindSpeed9am : int  20 4 19 11 7 19 20 6 7 15 ...
##  $ WindSpeed3pm : int  24 22 26 9 20 24 24 17 28 11 ...
##  $ Humidity9am  : int  71 44 38 45 82 55 49 48 42 58 ...
##  $ Humidity3pm  : int  22 25 30 16 33 23 19 19 9 27 ...
##  $ Pressure9am  : num  1008 1011 1008 1018 1011 ...
##  $ Pressure3pm  : num  1007 1008 1009 1013 1006 ...
##  $ Cloud9am     : int  8 NA NA NA 7 NA 1 NA NA NA ...
##  $ Cloud3pm     : int  NA NA 2 NA 8 NA NA NA NA NA ...
##  $ Temp9am      : num  16.9 17.2 21 18.1 17.8 20.6 18.1 16.3 18.3 20.1 ...
##  $ Temp3pm      : num  21.8 24.3 23.2 26.5 29.7 28.9 24.6 25.5 30.2 28.2 ...
##  $ RainToday    : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 2 ...
##  $ RISK_MM      : num  0 0 0 1 0.2 0 0 0 1.4 0 ...
##  $ RainTomorrow : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 2 1 ...

We only disagree with two of the above variables’ classifications.


2.2.1 The Date Variable

str(weather$Date)
##  Factor w/ 3436 levels "2007-11-01","2007-11-02",..: 397 398 399 400 401 402 403 404 405 406 ...

The ‘Date’ variable should be expressed as a Date object instead of being a factor with over 3000 levels.


Let’s format it as a Date object:

format_date = as.Date(weather$Date)
str(format_date)
##  Date[1:142193], format: "2008-12-01" "2008-12-02" "2008-12-03" "2008-12-04" "2008-12-05" ...


As a result, more useful information such as the day of the week and the name of the month can be extracted. Let’s look at the days of the week and the months on which the first six observations occurred:

# Day of the week
head(format(format_date, "%A"))
## [1] "Monday"    "Tuesday"   "Wednesday" "Thursday"  "Friday"    "Saturday"
# Abbreviated month
head(format(format_date, "%b"))
## [1] "Dec" "Dec" "Dec" "Dec" "Dec" "Dec"
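
The conversion above can also be sketched on a small stand-alone vector. This is a minimal illustration, assuming a toy factor (`date_factor`, a hypothetical name) that holds three date strings taken from the output in this report. It converts through character with an explicit format and extracts the month as a number, which does not depend on the locale in the way that day and month names do.

```r
# Toy factor standing in for weather$Date (hypothetical name, values from this report)
date_factor <- factor(c("2008-12-01", "2008-12-02", "2017-06-25"))

# Convert via character with an explicit format so the parsing is unambiguous
dates <- as.Date(as.character(date_factor), format = "%Y-%m-%d")

# The result is a Date object rather than a factor
class(dates)

# Month as a number (locale-independent, unlike "%A" or "%b")
month_number <- as.integer(format(dates, "%m"))

# Date arithmetic now works: gap in days between the first two observations
day_gap <- as.numeric(dates[2] - dates[1])
```

Because the result is a true Date, range and difference calculations work directly, which is not possible with a factor.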


2.2.2 The RainToday & RainTomorrow Variables

str(weather$RainToday)
##  Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 2 ...
str(weather$RainTomorrow)
##  Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 2 1 ...

As seen above, RainToday and RainTomorrow are factors with two levels: “Yes” or “No”. However, these are better expressed as a logical type (or Boolean, i.e. TRUE or FALSE).


Let’s convert them so that the str() function reports a logical type:

# Changing RainToday to a logical type
levels(weather$RainToday)[1] = FALSE
levels(weather$RainToday)[2] = TRUE
logi_rain_today = as.logical(weather$RainToday)
str(logi_rain_today)
##  logi [1:142193] FALSE FALSE FALSE FALSE FALSE FALSE ...
# Changing RainTomorrow to a logical type
levels(weather$RainTomorrow)[1] = FALSE
levels(weather$RainTomorrow)[2] = TRUE
logi_rain_tomorrow = as.logical(weather$RainTomorrow)
str(logi_rain_tomorrow)
##  logi [1:142193] FALSE FALSE FALSE FALSE FALSE FALSE ...
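
As a point of comparison, a Yes/No factor can also be turned into a logical vector without relabelling its levels. The sketch below is an alternative illustration rather than the method used in this report, and it assumes a toy factor (`rain_today`, a hypothetical name) that still carries the original "No" and "Yes" labels.

```r
# Toy factor with the original labels and one missing value (hypothetical data)
rain_today <- factor(c("No", "Yes", NA, "No"), levels = c("No", "Yes"))

# Comparing against "Yes" yields TRUE/FALSE and keeps NA as NA
rain_logical <- as.vector(rain_today == "Yes")

# Percentage of rainy days among the non-missing observations
percent_rain <- mean(rain_logical, na.rm = TRUE) * 100
```

This leaves the original column untouched, so later code that expects the "Yes" and "No" labels would continue to work.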


2.3 Initial Questions About the Data

What is the spread of each variable?

# Looking at the spread of the data
summary(weather)
##          Date            Location         MinTemp         MaxTemp     
##  2013-03-02:    49   Canberra:  3418   Min.   :-8.50   Min.   :-4.80  
##  2013-03-03:    49   Sydney  :  3337   1st Qu.: 7.60   1st Qu.:17.90  
##  2013-03-04:    49   Perth   :  3193   Median :12.00   Median :22.60  
##  2013-03-06:    49   Darwin  :  3192   Mean   :12.19   Mean   :23.23  
##  2013-03-07:    49   Hobart  :  3188   3rd Qu.:16.80   3rd Qu.:28.20  
##  2013-03-10:    49   Brisbane:  3161   Max.   :33.90   Max.   :48.10  
##  (Other)   :141899   (Other) :122704   NA's   :637     NA's   :322    
##     Rainfall       Evaporation        Sunshine      WindGustDir   
##  Min.   :  0.00   Min.   :  0.00   Min.   : 0.00   W      : 9780  
##  1st Qu.:  0.00   1st Qu.:  2.60   1st Qu.: 4.90   SE     : 9309  
##  Median :  0.00   Median :  4.80   Median : 8.50   E      : 9071  
##  Mean   :  2.35   Mean   :  5.47   Mean   : 7.62   N      : 9033  
##  3rd Qu.:  0.80   3rd Qu.:  7.40   3rd Qu.:10.60   SSE    : 8993  
##  Max.   :371.00   Max.   :145.00   Max.   :14.50   (Other):86677  
##  NA's   :1406     NA's   :60843    NA's   :67816   NA's   : 9330  
##  WindGustSpeed      WindDir9am      WindDir3pm     WindSpeed9am 
##  Min.   :  6.00   N      :11393   SE     :10663   Min.   :  0   
##  1st Qu.: 31.00   SE     : 9162   W      : 9911   1st Qu.:  7   
##  Median : 39.00   E      : 9024   S      : 9598   Median : 13   
##  Mean   : 39.98   SSE    : 8966   WSW    : 9329   Mean   : 14   
##  3rd Qu.: 48.00   NW     : 8552   SW     : 9182   3rd Qu.: 19   
##  Max.   :135.00   (Other):85083   (Other):89732   Max.   :130   
##  NA's   :9270     NA's   :10013   NA's   : 3778   NA's   :1348  
##   WindSpeed3pm    Humidity9am      Humidity3pm      Pressure9am    
##  Min.   : 0.00   Min.   :  0.00   Min.   :  0.00   Min.   : 980.5  
##  1st Qu.:13.00   1st Qu.: 57.00   1st Qu.: 37.00   1st Qu.:1012.9  
##  Median :19.00   Median : 70.00   Median : 52.00   Median :1017.6  
##  Mean   :18.64   Mean   : 68.84   Mean   : 51.48   Mean   :1017.7  
##  3rd Qu.:24.00   3rd Qu.: 83.00   3rd Qu.: 66.00   3rd Qu.:1022.4  
##  Max.   :87.00   Max.   :100.00   Max.   :100.00   Max.   :1041.0  
##  NA's   :2630    NA's   :1774     NA's   :3610     NA's   :14014   
##   Pressure3pm        Cloud9am        Cloud3pm        Temp9am     
##  Min.   : 977.1   Min.   :0.00    Min.   :0.0     Min.   :-7.20  
##  1st Qu.:1010.4   1st Qu.:1.00    1st Qu.:2.0     1st Qu.:12.30  
##  Median :1015.2   Median :5.00    Median :5.0     Median :16.70  
##  Mean   :1015.3   Mean   :4.44    Mean   :4.5     Mean   :16.99  
##  3rd Qu.:1020.0   3rd Qu.:7.00    3rd Qu.:7.0     3rd Qu.:21.60  
##  Max.   :1039.6   Max.   :9.00    Max.   :9.0     Max.   :40.20  
##  NA's   :13981    NA's   :53657   NA's   :57094   NA's   :904    
##     Temp3pm      RainToday         RISK_MM        RainTomorrow  
##  Min.   :-5.40   FALSE:109332   Min.   :  0.000   FALSE:110316  
##  1st Qu.:16.60   TRUE : 31455   1st Qu.:  0.000   TRUE : 31877  
##  Median :21.10   NA's :  1406   Median :  0.000                 
##  Mean   :21.69                  Mean   :  2.361                 
##  3rd Qu.:26.40                  3rd Qu.:  0.800                 
##  Max.   :46.70                  Max.   :371.000                 
##  NA's   :2726


How large is the data set?

# Looking at the dimensions of the data
dim(weather)
## [1] 142193     24

It contains 142 193 rows (observations) and 24 columns (variables).


Over what period of time does the data set span?

# Finding the initial and final weather observations
min(as.Date(weather$Date))
## [1] "2007-11-01"
max(as.Date(weather$Date))
## [1] "2017-06-25"

It was collected between November 2007 and June 2017.


Which locations are used? How many are there?

# Finding the names of each location, sorted in alphabetical order
sort(unique(weather$Location))
##  [1] Adelaide         Albany           Albury           AliceSprings    
##  [5] BadgerysCreek    Ballarat         Bendigo          Brisbane        
##  [9] Cairns           Canberra         Cobar            CoffsHarbour    
## [13] Dartmoor         Darwin           GoldCoast        Hobart          
## [17] Katherine        Launceston       Melbourne        MelbourneAirport
## [21] Mildura          Moree            MountGambier     MountGinini     
## [25] Newcastle        Nhil             NorahHead        NorfolkIsland   
## [29] Nuriootpa        PearceRAAF       Penrith          Perth           
## [33] PerthAirport     Portland         Richmond         Sale            
## [37] SalmonGums       Sydney           SydneyAirport    Townsville      
## [41] Tuggeranong      Uluru            WaggaWagga       Walpole         
## [45] Watsonia         Williamtown      Witchcliffe      Wollongong      
## [49] Woomera         
## 49 Levels: Adelaide Albany Albury AliceSprings BadgerysCreek ... Woomera

From above, the data covers 49 locations in Australia, listed alphabetically from Adelaide to Woomera.


2.4 Source of the Data

The data was obtained from Kaggle, but it originates from the Australian Government Bureau of Meteorology’s website. It is a combination of two separate data sets on daily weather records and climate data.

Each row represents a new weather observation while each column represents the properties of the weather observations.


2.5 Possible issues with the data

The data was combined from two separate data sets; one recording daily observations and the other, climate data. Without knowing how the two were combined, the data’s validity comes into question.

In spite of this, the data’s origins in the Australian Government do suggest a high degree of validity.

Other possible issues include gaps in the table where a valid observation is not available, for example because of a failure in observing equipment. These gaps are recorded as NA values, which reduces the number of usable observations. The summary in section 2.3 shows that Sunshine and Evaporation are each missing for more than 60 000 rows.
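
The extent of the missing data can be quantified per variable. The following is a minimal sketch on a toy data frame (`toy_weather`, a hypothetical name with made-up rows), assuming the same column names as the real data set. The same expression could be applied to the full `weather` data frame.

```r
# Toy data frame with a few missing values (hypothetical rows)
toy_weather <- data.frame(
  Rainfall = c(0.6, 0, NA, 1.0),
  Sunshine = c(NA, NA, 8.5, NA),
  MaxTemp  = c(22.9, 25.1, 25.7, 28.0)
)

# is.na() gives a logical matrix; column means of it are the NA proportions
na_percent <- colMeans(is.na(toy_weather)) * 100

# Variables ordered from most to least incomplete
sort(na_percent, decreasing = TRUE)
```

Variables with a high percentage here, such as Sunshine in the real data, should be interpreted with more caution than nearly complete ones.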


2.6 Assessing Stakeholders

Possible stakeholders include:

  • Governments: Being informed on climate trends in order to institute environmental policy and anti-climate change measures.
  • Agriculture: Monitoring rain seasons and temperature, and how they change over time, is important for crop growth and animal rearing.
  • Tourism: Large changes in the climate for some areas may make certain tourist activities unsuitable or undesirable.
  • Insurance: A change in climate for some areas may increase the insurance risk for financial companies and thus influence the pricing of insurance policies.
  • Individuals: In considering where they might live an individual may favour locations where they can generate enough renewable energy to subsist or perhaps enjoy a tropical climate.


2.7 Domain knowledge

Weather describes a combination of meteorological factors such as rainfall, temperature, humidity, wind speed and wind direction. While weather describes conditions over a short period of time, climate describes the long-term patterns in weather conditions for a certain region.

Climate and weather data is important to a wide range of industries including agriculture, tourism and renewable energy. By observing climate data over periods of time, we can analyse trends and predict future climate behaviour. We can then apply this research to specific industries in order to optimise output efficiency.



3 Research Questions

3.1 Where is the most optimal location for agricultural production in Australia?

What makes a location good for agricultural production?

  • Consistent rainfall
  • Plenty of sunshine
  • Protection from natural disasters
# Creating a data set that summarises each location by its chance of receiving rainfall
percent_rain_data = weather %>% 
  group_by(Location) %>% 
  summarise(percent_rain = mean(RainToday == "TRUE", na.rm = TRUE)*100)

# Bar plot to show the chance of a rainy day in each location
chance_rain = plot_ly(percent_rain_data, x = ~reorder(Location,-percent_rain), y = ~percent_rain, type = "bar", color = I("rgba(0,128,128,0.9)")) %>%
  layout(title = "Chance of a Rainy Day Across Australia", xaxis = list(title="Location"), yaxis = list(title="Percentage (%)"))
chance_rain

From the bar plot above, the top five most consistent locations for rainfall in Australia appear to be:

  1. Portland
  2. Walpole
  3. Cairns
  4. Dartmoor
  5. Norfolk Island
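
The chance of a rainy day is simply the proportion of TRUE values per location. Below is a minimal base R sketch of that calculation, assuming a toy data frame (`toy`, hypothetical) with two made-up locations "A" and "B". It mirrors the grouped summary above but is not the code used to produce the plot.

```r
# Toy data: two hypothetical locations with logical rain indicators
toy <- data.frame(
  Location  = c("A", "A", "A", "A", "B", "B", "B", "B"),
  RainToday = c(TRUE, FALSE, NA, TRUE, FALSE, FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Mean of a logical vector is the proportion of TRUE values; NA days are dropped
percent_rain <- sapply(split(toy$RainToday, toy$Location), mean, na.rm = TRUE) * 100

# Locations ranked from most to least frequently rainy
ranked <- sort(percent_rain, decreasing = TRUE)
```

Note that days with a missing RainToday value are excluded from the denominator, so each percentage is relative to recorded days only.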


However, we should look at the amount of rainfall received by these locations on rainy days. This is represented below:

# Creating a data set that summarises each location by its average rainfall on rainy days
mean_rain_data = weather %>%
  group_by(Location) %>%
  filter(Rainfall > 0) %>%
  summarise(mean_rainfall = mean(Rainfall))

# Bar plot to show the average rainfall in each location on rainy days
mean_rain = plot_ly(mean_rain_data, x = ~reorder(Location, -mean_rainfall), y = ~mean_rainfall, type ="bar", color = I("rgba(0,128,128,0.9)")) %>%
  layout(title = "Average Rainfall on Rainy Days Across Australia", xaxis = list(title = "Location"), yaxis = list(title = "Mean Rainfall (mm)"))
mean_rain
# Creating a data set that summarises each location by its median rainfall on rainy days
median_rain_data = weather %>%
  group_by(Location) %>%
  filter(Rainfall > 0) %>%
  summarise(median_rainfall = median(Rainfall))

# Bar plot to show the median rainfall in each location on rainy days
median_rain = plot_ly(median_rain_data, x = ~reorder(Location, -median_rainfall), y = ~median_rainfall, type = "bar", color = I('rgba(0,128,128,0.9)')) %>%
  layout(title = "Median Rainfall on Rainy Days Across Australia", xaxis = list(title = "Location"), yaxis = list(title = "Median Rainfall (mm)"))
median_rain

According to the two bar plots above, three locations consistently appear in the top five locations for both their mean and median rainfall. These are:

  1. Darwin
  2. Katherine
  3. Cairns
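
Both the mean and the median are examined because daily rainfall is strongly skewed. The sketch below uses made-up rainfall values (`rainfall`, hypothetical) to show how a single extreme day pulls the mean upward while the median barely moves.

```r
# Hypothetical daily rainfall in mm, including dry days, one extreme day and one NA
rainfall <- c(0, 0, 0.2, 1.0, 2.4, 0, 60.0, NA)

# Keep only recorded rainy days, as in filter(Rainfall > 0)
rainy <- rainfall[!is.na(rainfall) & rainfall > 0]

# The mean is pulled upward by the single 60 mm day
mean_rainy <- mean(rainy)

# The median is robust to that extreme day
median_rainy <- median(rainy)
```

A location that ranks highly on both measures therefore receives substantial rain on a typical rainy day, not only during rare downpours.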

However, although Katherine receives high rainfall on rainy days, it rains there on only 17% of days each year on average. Its rainfall is therefore highly inconsistent, and it cannot be regarded as an ideal location for farming.

On the other hand, it rains in Darwin on 27% of days each year on average, with a median rainfall of 7.4 mm, making it a very consistent location for high rainfall. Furthermore, Cairns appears in our previous bar plot, ranking as the 3rd most consistent location for rainfall at 32% of days each year on average.

Yet, an optimal location for agricultural production also requires plenty of sunshine:

# Box plot to show the number of hours of bright sunshine by location
sunshine = plot_ly(weather, x = ~Sunshine, y = ~Location, type = "box", color = ~Location, marker = list(size = 5, opacity = 0.2)) %>%
  layout(title = 'Sunshine Across Australia From 2007-2017', yaxis = list(title = 'Locations', autorange = TRUE, categoryorder = "category descending"), xaxis = list(title = 'Sunshine (Hours per Day)'))
sunshine

From the box plot of hours of sunshine across Australia, notice that Darwin has a relatively high median of 10 hours of sunshine per day and a reasonably low IQR (interquartile range) of 4 hours, indicating a consistently large number of hours of sunshine per day. Cairns also has a high median of 8.6 hours of sunshine per day, alongside an adequate IQR of 5.5 hours.
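
The median and IQR read from a box plot can also be computed directly. The following minimal sketch uses made-up sunshine hours (`sunshine_hours`, hypothetical) and R's default quantile method. The plotting library may compute quartiles slightly differently, so values read from the chart need not match exactly.

```r
# Hypothetical daily sunshine hours for one location, with one missing day
sunshine_hours <- c(2.0, 6.5, 8.0, 9.5, 10.0, 10.5, 11.0, NA)

# Typical day: the median ignores the single dull day at 2 hours
median_sunshine <- median(sunshine_hours, na.rm = TRUE)

# Consistency: the IQR is the width of the middle 50% of days
quartiles <- quantile(sunshine_hours, c(0.25, 0.75), na.rm = TRUE)
iqr_sunshine <- IQR(sunshine_hours, na.rm = TRUE)
```

A high median combined with a small IQR is what marks a location as both sunny and reliably so.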

This further suggests that Darwin and Cairns are well suited to agriculture, and are perhaps the most favourable locations in Australia.

However, other factors must also be considered. One such factor would be the built-up nature of Darwin’s topography and the subsequent lack of available free land for agriculture. Another factor is the possibility of flash flooding, evident in Darwin with one day in 2011 receiving 367.6mm of rain, the 2nd highest amount of rainfall in one day in Australia over the last 10 years.

Similarly, Cairns must also be assessed more deeply. Like most of North Queensland, Cairns is prone to tropical cyclones which, again, would heavily influence the decision of whether or not to implement agricultural endeavours in the region.


3.2 How does temperature vary in different climates across Australia?

To best address this question, we shall analyse the data obtained from Sydney, Alice Springs and Darwin, since they represent markedly different climates across Australia: temperate sub-tropical, hot desert and tropical savanna respectively.

# Creating a data set that splits the date column into year, month and day
date_split_data = weather %>%
  tidyr::separate(col = Date,
                  into = c("year", "month", "day"),
                  sep = "-")

# Adding a column for the average temperature
date_split_data = mutate(date_split_data, mean_temp = (MaxTemp+MinTemp)/2)

# Adding a column for the day of the week
day_name = format(as.Date(weather$Date), "%A")
date_split_data = cbind(date_split_data, day_name)

# Adding a column for the month name as a factor
month_name = format(as.Date(weather$Date), "%B")
date_split_data = cbind(date_split_data, month_name)

# Reordering the columns
date_split_data = date_split_data[, c(1, 2, 29, 3, 28, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27)]

# Ordering the month column
date_split_data$month_name = factor(date_split_data$month_name, c("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"))
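
The month ordering above can be sketched more compactly with R's built-in `month.name` constant, which holds the twelve English month names in calendar order. This is a minimal illustration on a hypothetical vector (`months_seen`), and it assumes English month names. Names produced by `format(..., "%B")` depend on the locale.

```r
# Hypothetical month names in arbitrary order
months_seen <- c("March", "January", "December", "January")

# month.name supplies the calendar order, so levels need not be typed by hand
month_factor <- factor(months_seen, levels = month.name)

# Sorting (and plotting) now follows the calendar rather than the alphabet
sorted_months <- as.character(sort(month_factor))
```

Without explicit levels, a factor would sort alphabetically and place April before January on a plot axis.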



# Creating a data set that summarises each location by its average monthly temperature 
mean_temp_month_data = date_split_data %>%
  group_by(Location, year, month_name) %>%
  summarise(mean_temp_month = mean(mean_temp, na.rm = TRUE))



# Line chart to show fluctuations in the average monthly temperature each year for Sydney
mean_temp_sydney = plot_ly(mean_temp_month_data, x = ~month_name, y = ~mean_temp_month, color = ~year, text = ~Location, hoverinfo = "text") %>%
  filter(Location == "Sydney") %>%
  add_trace(type = "scatter", mode = "lines", line = list(shape = "spline"), colors = c("#edf8b1", "#7fcdbb", "#2c7fb8")) %>%
  layout(title = "Average Temperature in Sydney Per Month", xaxis = list(title = "Month", range = c(0,11)), yaxis = list(title = "Mean Temperature (°C)", range = c(0,35)))
mean_temp_sydney

Sydney has a temperate sub-tropical climate.

According to the line chart above, this climate is characterised by a gradual change in temperature throughout the year, from about 13°C to 25°C, rather than by extreme seasonal differences. This moderation is likely due to Sydney’s proximity to the ocean. The trend is also highly consistent over time, with only minor fluctuations between years.

Indeed, a temperate sub-tropical climate is known to exhibit a gradual shift between mild winters and warm summers, with the shape of the annual temperature graph indicating four distinct seasons.

# Line chart to show fluctuations in the average monthly temperature each year for Alice Springs
mean_temp_alicesprings = plot_ly(mean_temp_month_data, x = ~month_name, y = ~mean_temp_month, color = ~year, text = ~Location, hoverinfo = "text") %>%
  filter(Location == "AliceSprings") %>%
  add_trace(type = "scatter", mode = "lines", line = list(shape = "spline"), colors = c("#ffeda0", "#feb24c", "#f03b20")) %>%
  layout(title = "Average Temperature in Alice Springs Per Month", xaxis = list(title = "Month", range = c(0,11)), yaxis = list(title = "Mean Temperature (°C)", range = c(0,35)))
mean_temp_alicesprings

Alice Springs has a hot desert climate.

From the line chart above, this climate appears to be characterised by high average temperatures in summer and low temperatures in winter. The wider spread of monthly mean temperatures, from about 9°C to 30°C, reflects this.

A hot desert climate is expected to produce such a curve: a four-season pattern with large seasonal differences and hence a steeper rise and fall.

# Line chart to show fluctuations in the average monthly temperature each year for Darwin
mean_temp_darwin = plot_ly(mean_temp_month_data, x = ~month_name, y = ~mean_temp_month, color = ~year, text = ~Location, hoverinfo = "text") %>%
  filter(Location == "Darwin") %>%
  add_trace(type = "scatter", mode = "lines", line = list(shape = "spline"), colors = c("#e7e1ef", "#c994c7", "#dd1c77")) %>%
  layout(title = "Average Temperature in Darwin Per Month", xaxis = list(title = "Month", range = c(0,11)), yaxis = list(title = "Mean Temperature (°C)", range = c(0,35)))
mean_temp_darwin

Darwin possesses a tropical savanna climate.

According to the line chart above, this type of climate appears to have much less distinct seasons. This is evident in the very high average temperature that persists throughout the year for Darwin with little variation, only ranging from 23-30°C.
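
The three climates can be compared by the width of their annual temperature range. The sketch below simply takes the approximate lower and upper monthly means quoted in this section (13-25°C, 9-30°C and 23-30°C) and subtracts them. The data frame and column names are hypothetical, and the figures are read from the charts rather than recomputed from the data.

```r
# Approximate monthly mean temperature ranges quoted in this section (°C)
temp_ranges <- data.frame(
  location = c("Sydney", "Alice Springs", "Darwin"),
  low_c    = c(13, 9, 23),
  high_c   = c(25, 30, 30),
  stringsAsFactors = FALSE
)

# Width of the annual range: a simple measure of how seasonal each climate is
temp_ranges$spread_c <- temp_ranges$high_c - temp_ranges$low_c

# Most seasonal climate first
by_spread <- temp_ranges[order(-temp_ranges$spread_c), ]
by_spread
```

On these figures Alice Springs has the widest range and Darwin the narrowest, which matches the shapes of the three line charts.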

In fact, instead of having four distinct seasons, a tropical savanna climate has distinct wet and dry seasons.

This characteristic is evident below:

# Changing RainToday to logical type
levels(date_split_data$RainToday)[1] = FALSE
levels(date_split_data$RainToday)[2] = TRUE
date_split_data = date_split_data %>%
  mutate(RainToday = as.logical(RainToday))

# Changing RainTomorrow to logical type
levels(date_split_data$RainTomorrow)[1] = FALSE
levels(date_split_data$RainTomorrow)[2] = TRUE
date_split_data = date_split_data %>%
  mutate(RainTomorrow = as.logical(RainTomorrow))

# Creating a data set that summarises rainfall and temperature for each location, year and month
daily_summary = date_split_data %>%
  group_by(Location, year, month_name) %>%
  summarise(mean_daily_rain = mean(Rainfall, na.rm=TRUE), 
            median_daily_rain = median(Rainfall, na.rm = TRUE), 
            total_monthly_rain = sum(Rainfall, na.rm = TRUE),
            max_daily_rain = max(Rainfall, na.rm=TRUE),
            min_daily_rain = min(Rainfall, na.rm=TRUE),
            mean_max_temp = mean(MaxTemp, na.rm=TRUE),
            mean_min_temp = mean(MinTemp, na.rm=TRUE),
            median_max_temp = median(MaxTemp, na.rm=TRUE),
            median_min_temp = median(MinTemp, na.rm=TRUE),
            max_daily_temp = max(MaxTemp, na.rm=TRUE),
            min_daily_temp = min(MinTemp, na.rm=TRUE),
            median_con_rain = median(rle(RainToday)$lengths[rle(RainToday)$values==TRUE], na.rm=TRUE))


# Bar plot of the mean rainfall for each month in Darwin
darwin_weather = subset(daily_summary, Location == "Darwin")
ggplot(darwin_weather, aes(x = month_name, y = mean_daily_rain, fill = year)) + geom_bar(stat = "identity") + ggtitle("Average Rainfall per Month in Darwin") + xlab("Month") + ylab("Rainfall (mm)") + theme(axis.text.x = element_text(angle = 45, hjust = 1))
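
The `median_con_rain` column in the monthly summary above relies on run-length encoding to measure spells of consecutive rainy days. The following is a minimal sketch of that idea on a hypothetical logical vector (`rain_today`), assuming there are no missing values. In the real data, `rle()` treats each NA as its own run, which is why the summary also passes `na.rm = TRUE` to the median.

```r
# Hypothetical sequence of nine days: TRUE means it rained
rain_today <- c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE)

# rle() collapses the sequence into runs of identical values and their lengths
runs <- rle(rain_today)

# Keep only the lengths of the rainy runs (wet spells)
wet_spells <- runs$lengths[runs$values]

# Typical length of a wet spell, in days
median_wet_spell <- median(wet_spells)
```

Long wet spells concentrated in a few months would be one numerical signature of a distinct wet season.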


3.3 Where is the ideal location for generating renewable energy in Australia?

# Creating a data set that adds a column for the average wind speed in each observation
mean_wind_speed_data = weather %>%
  mutate(mean_wind_speed = (WindSpeed3pm + WindSpeed9am)/2)

# Box plot to show the average wind speed by location
wind_speed = plot_ly(mean_wind_speed_data, y = ~mean_wind_speed, x = ~Location, type = "box", color = ~Location, marker = list(size = 3, opacity = 0.9)) %>%
  layout(title = "Average Wind Speed Across Australia From 2007-2017", yaxis = list(title = "Wind Speed (km/h)"), xaxis = list(title = "Locations"))
wind_speed

Wind speeds are broadly similar across locations. Even so, comparing the medians and IQRs in the box plot above suggests that the most promising locations for generating wind energy are:

  1. Sydney Airport
  2. Melbourne
  3. Woomera
  4. Darwin
  5. Norfolk Island

However, it is unrealistic in practice to develop a system of wind turbines in a location as congested as Sydney Airport. Hence we shall disregard this location.

Similarly, we can also disregard Mount Gambier and Norfolk Island due to the technical challenge of setting up a wind turbine system on a tall mountain and a small island respectively.

Melbourne, Woomera and Darwin still seem appealing locations, yet Woomera would indeed have a lot more free land to construct wind turbines on.
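
One detail of the average wind speed calculation is worth noting: adding the 9am and 3pm readings gives NA whenever either reading is missing. The sketch below contrasts that behaviour with a row-wise mean that uses whichever reading is available. The data frame `wind` holds made-up values, and the second approach is an alternative for illustration, not the method used in this report.

```r
# Hypothetical readings with one missing value in each column
wind <- data.frame(
  WindSpeed9am = c(20, 4, NA, 11),
  WindSpeed3pm = c(24, 22, 26, NA)
)

# As in the report: NA if either reading is missing
strict_mean <- (wind$WindSpeed9am + wind$WindSpeed3pm) / 2

# Alternative: average whichever readings are present in each row
lenient_mean <- unname(rowMeans(wind[, c("WindSpeed9am", "WindSpeed3pm")], na.rm = TRUE))
```

The strict version discards more days but always averages two readings, while the lenient version keeps more days at the cost of mixing one-reading and two-reading averages.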

Now let’s consider sunshine as well:

# Creating a data set that summarises each location by its median sunshine and wind speed
median_sunshine_and_wind_speed_data = mean_wind_speed_data %>%
  group_by(Location) %>%
  summarise(median_sunshine = median(Sunshine, na.rm = TRUE), median_wind_speed = median(mean_wind_speed, na.rm = TRUE))

# Scatter plot to show the median hours of bright sunshine against the median wind speed
sunshine_wind_speed = plot_ly(median_sunshine_and_wind_speed_data, x = ~median_wind_speed, y = ~median_sunshine, type = "scatter", mode = "markers", color = ~Location) %>%
  layout(title = 'Median Hours of Sunshine VS Median Wind Speed', yaxis = list(title = 'Median Hours of Sunshine per Day', autorange = TRUE), xaxis = list(title = 'Median Wind Speed (km/h)'))
sunshine_wind_speed

According to the scatter plot above, Woomera is the most promising location for generating renewable energy, with a median of 10 hours of sunshine per day and a median wind speed of 19.5 km/h.

By contrast, although Melbourne has a median wind speed of 18.5 km/h, it receives a median of only 6.7 hours of sunshine per day, which limits its potential for solar energy.

Darwin also looks promising, with a median wind speed of 17.5 km/h and a median of 10 hours of sunshine per day. Its wind speed is also more consistent, with an IQR of 6.5 km/h compared with 10.5 km/h for Woomera.
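
One way to combine the two measures is to rank the candidates on each and add the ranks. The sketch below does this with the medians quoted above. The rank-sum score is an assumption made here for illustration and is not the method used in this report, and the data frame and column names are hypothetical.

```r
# Medians quoted in this section for the three remaining candidates
candidates <- data.frame(
  location          = c("Woomera", "Melbourne", "Darwin"),
  median_sunshine_h = c(10, 6.7, 10),
  median_wind_kmh   = c(19.5, 18.5, 17.5),
  stringsAsFactors  = FALSE
)

# Rank each measure so that 1 is best; ties share the average rank
sunshine_rank <- rank(-candidates$median_sunshine_h)
wind_rank     <- rank(-candidates$median_wind_kmh)

# Illustrative combined score: lower is better
candidates$rank_sum <- sunshine_rank + wind_rank

# Candidates ordered from best to worst combined score
ranked <- candidates[order(candidates$rank_sum), ]
ranked
```

Under this simple scoring Woomera comes first and Darwin second, though the ordering could change if the two measures were weighted differently.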


Thus, considering all the factors, it appears that Woomera and Darwin are the most ideal locations for generating renewable energy in Australia.


4 Session Info

sessionInfo()
## R version 3.5.2 (2018-12-20)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 17134)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252   
## [3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C                      
## [5] LC_TIME=English_Australia.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] psych_1.8.12     caret_6.0-81     lattice_0.20-38  plotly_4.8.0    
##  [5] forcats_0.4.0    stringr_1.4.0    dplyr_0.8.0.1    purrr_0.3.1     
##  [9] readr_1.3.1      tidyr_0.8.3      tibble_2.0.1     ggplot2_3.1.0   
## [13] tidyverse_1.2.1  kableExtra_1.0.1 magrittr_1.5     knitr_1.21      
## 
## loaded via a namespace (and not attached):
##  [1] httr_1.4.0         jsonlite_1.6       viridisLite_0.3.0 
##  [4] splines_3.5.2      foreach_1.4.4      prodlim_2018.04.18
##  [7] modelr_0.1.4       shiny_1.2.0        assertthat_0.2.0  
## [10] highr_0.7          stats4_3.5.2       cellranger_1.1.0  
## [13] yaml_2.2.0         ipred_0.9-8        pillar_1.3.1      
## [16] backports_1.1.3    glue_1.3.1         digest_0.6.18     
## [19] RColorBrewer_1.1-2 promises_1.0.1     rvest_0.3.2       
## [22] colorspace_1.4-0   recipes_0.1.4      httpuv_1.4.5.1    
## [25] htmltools_0.3.6    Matrix_1.2-15      plyr_1.8.4        
## [28] timeDate_3043.102  pkgconfig_2.0.2    broom_0.5.1       
## [31] haven_2.1.0        xtable_1.8-3       scales_1.0.0      
## [34] webshot_0.5.1      later_0.8.0        gower_0.2.0       
## [37] lava_1.6.5         generics_0.0.2     withr_2.1.2       
## [40] nnet_7.3-12        lazyeval_0.2.1     cli_1.0.1         
## [43] mnormt_1.5-5       mime_0.6           survival_2.43-3   
## [46] crayon_1.3.4       readxl_1.3.1       evaluate_0.13     
## [49] nlme_3.1-137       MASS_7.3-51.1      foreign_0.8-71    
## [52] xml2_1.2.0         class_7.3-14       tools_3.5.2       
## [55] data.table_1.12.0  hms_0.4.2          munsell_0.5.0     
## [58] compiler_3.5.2     rlang_0.3.1        grid_3.5.2        
## [61] iterators_1.0.10   rstudioapi_0.9.0   htmlwidgets_1.3   
## [64] crosstalk_1.0.0    labeling_0.3       rmarkdown_1.11    
## [67] gtable_0.2.0       ModelMetrics_1.2.2 codetools_0.2-15  
## [70] reshape2_1.4.3     R6_2.4.0           lubridate_1.7.4   
## [73] stringi_1.4.3      parallel_3.5.2     Rcpp_1.0.0        
## [76] rpart_4.1-13       tidyselect_0.2.5   xfun_0.5


5 References

Kaggle.com. (2019). Rain in Australia. [online] Available at: https://www.kaggle.com/jsphyg/weather-dataset-rattle-package [Accessed 13 Mar. 2019].

Bom.gov.au. (2019). Climate Data Online. [online] Available at: http://www.bom.gov.au/climate/data/?fbclid=IwAR2CLU4ge5DcxbXfRBPA0hshBijbCXu6oir2B7hNZAL5WMSY0SGlIeXzklI [Accessed 13 Mar. 2019].

Weatheronline.co.uk. (2019). Climate of the World: Australia | weatheronline.co.uk. [online] Available at: https://www.weatheronline.co.uk/reports/climate/Australia.htm [Accessed 13 Mar. 2019].

Colorbrewer2.org. (2019). ColorBrewer: Color Advice for Maps. [online] Available at: http://colorbrewer2.org/?fbclid=IwAR1v0BXFZsss_fEZ0TaI74MOarltAPJZWz-KivgKQp7CiGaUeQc7J-piFkE#type=qualitative&scheme=Set1&n=3 [Accessed 20 Mar. 2019].

19january2017snapshot.epa.gov. (2019). Climate Impacts on Agriculture and Food Supply | Climate Change Impacts | US EPA. [online] Available at: https://19january2017snapshot.epa.gov/climate-impacts/climate-impacts-agriculture-and-food-supply_.html [Accessed 16 Mar. 2019].